Pearson type III distribution#
The Pearson type III distribution (SciPy: pearson3) is a continuous distribution that can be seen as a shifted and scaled Gamma distribution, but parameterized in a way that makes its skewness explicit. It’s a common choice when you want a simple, interpretable model for skewed data—especially in the classic log-Pearson III approach used in hydrology.
1) Title & classification#
| Item | Value |
|---|---|
| Name | Pearson type III (`pearson3`) |
| Type | Continuous |
| Support | If \(\gamma=0\): \(x \in \mathbb{R}\); if \(\gamma>0\): \(x \in [\mu - 2\sigma/\gamma,\infty)\); if \(\gamma<0\): \(x \in (-\infty,\mu - 2\sigma/\gamma]\) |
| Parameter space (SciPy) | skew \(\gamma \in \mathbb{R}\), location \(\mu \in \mathbb{R}\), scale \(\sigma>0\) |
| Key idea | A location–scale transform of a shifted Gamma with skewness fixed to \(\gamma\) |
What you’ll be able to do after this notebook#
- understand `pearson3` as a standardized shifted Gamma (and when it reduces to a Normal)
- write the PDF/CDF and compute key moments (mean/variance/skewness/kurtosis)
- interpret how `skew`, `loc`, and `scale` change the shape and the finite endpoint
- derive mean/variance and the (piecewise) log-likelihood
- sample from `pearson3` using NumPy only (via a Gamma sampler)
- use `scipy.stats.pearson3` for evaluation, simulation, and fitting
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from scipy import stats
from scipy.special import gammainc, gammaincc, gammaln, ndtr, psi
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(42)
2) Intuition & motivation#
What it models#
A Pearson type III random variable is essentially a Gamma-shaped bump that has been shifted and scaled so that its mean and variance are controlled by (`loc`, `scale`), while a single parameter `skew` \(= \gamma\) controls the asymmetry.
A distinctive feature is that for \(\gamma\neq 0\) the distribution has a finite endpoint on one side:
- \(\gamma>0\) (right-skew): there is a hard lower bound.
- \(\gamma<0\) (left-skew): there is a hard upper bound.
As \(|\gamma|\to 0\), that endpoint moves off to \(\pm\infty\) and the distribution approaches a Normal.
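This limit is easy to see numerically. The sketch below (arbitrary skew values and grid, chosen so the grid stays inside the support) tracks both the endpoint \(-2/\gamma\) (for `loc=0`, `scale=1`) and the maximum gap between the `pearson3` PDF and the standard Normal PDF:

```python
import numpy as np
from scipy import stats

x = np.linspace(-3.9, 4.0, 200)  # stays inside the support for every skew below
diffs = []
for g in [0.5, 0.25, 0.1]:
    endpoint = -2.0 / g  # lower endpoint for loc=0, scale=1 (gamma > 0)
    gap = float(np.max(np.abs(stats.pearson3.pdf(x, g) - stats.norm.pdf(x))))
    diffs.append(gap)
    print(f"skew={g:4.2f}  endpoint={endpoint:6.1f}  max |pearson3 - normal| = {gap:.4f}")
```

As the skew shrinks, the endpoint runs off to \(-\infty\) and the gap to the Normal PDF shrinks toward zero.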
Typical real-world use cases#
- Hydrology: the log-Pearson type III model for flood/peak-flow frequency analysis.
- Environmental data: precipitation totals, pollutant concentrations (after transformations).
- Reliability & lifetimes: skewed quantities where a one-sided endpoint is physically meaningful.
- Finance/insurance: skewed severity models in limited contexts (often after preprocessing).
Relations to other distributions#
- Gamma: Pearson type III is a (shifted/scaled) Gamma; it inherits the Gamma’s analytic tractability.
- Normal: the special/limit case \(\gamma=0\) is the Normal distribution.
- Pearson system: it is “Type III” in the Pearson family classified by differential equations / moment structure.
- Log-Pearson III: if \(Z\sim\text{Pearson3}\), then \(X=\exp(Z)\) (or \(10^Z\)) gives a log-Pearson III model.
3) Formal definition#
SciPy’s `pearson3` is designed so that the parameter `skew` \(= \gamma\) is literally the skewness (third standardized moment) of the standardized distribution.
Standard form (mean 0, variance 1)#
Let \(Z\sim\text{Pearson3}(\gamma)\).
If \(\gamma = 0\), then \(Z\sim\mathcal{N}(0,1)\).
If \(\gamma \neq 0\), define
\begin{aligned} \beta &= \frac{2}{\gamma}, \\ \alpha &= \beta^2 = \frac{4}{\gamma^2}, \\ \zeta &= -\frac{\alpha}{\beta} = -\beta = -\frac{2}{\gamma}. \end{aligned}
Let \(Y\sim\text{Gamma}(\alpha, 1)\) (shape \(\alpha\), scale 1), and set
\begin{aligned} Z = \zeta + \frac{1}{\beta}Y. \end{aligned}
This construction guarantees
\begin{aligned} \mathbb{E}[Z]=0,\qquad \mathrm{Var}(Z)=1,\qquad \mathrm{Skew}(Z)=\gamma. \end{aligned}
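These identities follow directly from the Gamma moments \(\mathbb{E}[Y]=\alpha\), \(\mathrm{Var}(Y)=\alpha\), and \(\mathrm{Skew}(Y)=2/\sqrt{\alpha}\). A small arithmetic sketch (the skew value here is an arbitrary choice):

```python
import numpy as np

g = 1.7  # any nonzero skew
beta = 2.0 / g
alpha = beta**2
zeta = -alpha / beta

mean_Z = zeta + alpha / beta                    # E[Z] = zeta + E[Y]/beta
var_Z = alpha / beta**2                         # Var(Z) = Var(Y)/beta^2
skew_Z = np.sign(beta) * 2.0 / np.sqrt(alpha)   # dividing by beta flips the skew sign if beta < 0

print(mean_Z, var_Z, skew_Z)  # ≈ 0, 1, 1.7
```

Because \(\alpha=\beta^2\), the variance is exactly 1 and the skewness \(2/\sqrt{\alpha}=2/|\beta|=|\gamma|\) picks up the sign of \(\gamma\) through \(\beta\).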
PDF#
For \(\gamma\neq 0\), the PDF of the standard form is
\begin{aligned} f_Z(z;\gamma) &= \frac{|\beta|}{\Gamma(\alpha)}\,[\beta(z-\zeta)]^{\alpha-1}\,\exp\{-\beta(z-\zeta)\}\,\mathbf{1}\{\beta(z-\zeta)>0\}. \end{aligned}
Equivalently, with \(u = \beta(z-\zeta)\), we have \(u\sim\text{Gamma}(\alpha,1)\) and
\begin{aligned} f_Z(z;\gamma) = |\beta|\,f_{\text{Gamma}}(u;\alpha,1),\qquad u=\beta(z-\zeta). \end{aligned}
CDF#
Write \(P(\alpha,u)\) for the regularized lower incomplete gamma and \(Q(\alpha,u)\) for the regularized upper incomplete gamma:
\begin{aligned} P(\alpha,u) &= \frac{\gamma(\alpha,u)}{\Gamma(\alpha)}, \\ Q(\alpha,u) &= \frac{\Gamma(\alpha,u)}{\Gamma(\alpha)} = 1 - P(\alpha,u). \end{aligned}
Here \(\gamma(\alpha,u)\) denotes the lower incomplete gamma function (not the skewness parameter).
Then for \(\gamma>0\) (right-skew, lower-bounded support):
\begin{aligned} F_Z(z) = \begin{cases} 0, & z\le \zeta, \\ P\bigl(\alpha,\,\beta(z-\zeta)\bigr), & z>\zeta. \end{cases} \end{aligned}
For \(\gamma<0\) (left-skew, upper-bounded support):
\begin{aligned} F_Z(z) = \begin{cases} Q\bigl(\alpha,\,\beta(z-\zeta)\bigr), & z<\zeta, \\ 1, & z\ge \zeta. \end{cases} \end{aligned}
And for \(\gamma=0\), \(F_Z(z)=\Phi(z)\).
Location–scale form#
For general `loc` \(= \mu\) and `scale` \(= \sigma\), define
\begin{aligned} X = \mu + \sigma Z. \end{aligned}
Then
\begin{aligned} f_X(x) &= \frac{1}{\sigma} f_Z\!\left(\frac{x-\mu}{\sigma}\right), \\ F_X(x) &= F_Z\!\left(\frac{x-\mu}{\sigma}\right). \end{aligned}
NORM2PEARSON_TRANSITION = 1.6e-5 # matches SciPy's internal cutoff
def pearson3_standard_params(skew):
"""Return (alpha, beta, zeta) for the *standard* Pearson3(skew) when skew != 0."""
skew = float(skew)
if skew == 0.0:
raise ValueError("skew must be nonzero for (alpha, beta, zeta)")
beta = 2.0 / skew
alpha = beta**2
zeta = -alpha / beta # equals -beta
return alpha, beta, zeta
def pearson3_support(skew, loc=0.0, scale=1.0):
"""Return (lower, upper) support bounds for Pearson3(skew, loc, scale)."""
skew = float(skew)
loc = float(loc)
scale = float(scale)
if scale <= 0:
raise ValueError("scale must be > 0")
if abs(skew) < NORM2PEARSON_TRANSITION:
return -np.inf, np.inf
_, _, zeta = pearson3_standard_params(skew)
boundary = loc + scale * zeta # = loc - 2*scale/skew
if skew > 0:
return boundary, np.inf
return -np.inf, boundary
def _norm_logpdf(z):
z = np.asarray(z, dtype=float)
return -0.5 * z**2 - 0.5 * np.log(2 * np.pi)
def _norm_cdf(z):
z = np.asarray(z, dtype=float)
return ndtr(z)
def pearson3_logpdf(x, skew, loc=0.0, scale=1.0):
"""Log-PDF of Pearson3(skew, loc, scale).
Notes
-----
- For |skew| near 0 we switch to the Normal approximation (like SciPy).
- For skew != 0 the support is one-sided, so points outside support get -inf.
"""
x = np.asarray(x, dtype=float)
skew = float(skew)
loc = float(loc)
scale = float(scale)
if scale <= 0:
raise ValueError("scale must be > 0")
z = (x - loc) / scale
if abs(skew) < NORM2PEARSON_TRANSITION:
return _norm_logpdf(z) - np.log(scale)
alpha, beta, zeta = pearson3_standard_params(skew)
u = beta * (z - zeta) # should be > 0 inside support
logpdf = np.full_like(z, -np.inf, dtype=float)
mask_pos = u > 0
logpdf[mask_pos] = (
np.log(abs(beta))
+ (alpha - 1) * np.log(u[mask_pos])
- u[mask_pos]
- gammaln(alpha)
)
# Handle the boundary u == 0 separately to avoid 0 * log(0) -> NaN when alpha == 1
mask_zero = u == 0
if np.any(mask_zero):
if np.isclose(alpha, 1.0):
logpdf[mask_zero] = np.log(abs(beta)) - gammaln(alpha)
elif alpha < 1.0:
logpdf[mask_zero] = np.inf
return logpdf - np.log(scale)
def pearson3_pdf(x, skew, loc=0.0, scale=1.0):
return np.exp(pearson3_logpdf(x, skew, loc=loc, scale=scale))
def pearson3_cdf(x, skew, loc=0.0, scale=1.0):
"""CDF of Pearson3(skew, loc, scale)."""
x = np.asarray(x, dtype=float)
skew = float(skew)
loc = float(loc)
scale = float(scale)
if scale <= 0:
raise ValueError("scale must be > 0")
z = (x - loc) / scale
if abs(skew) < NORM2PEARSON_TRANSITION:
return _norm_cdf(z)
alpha, beta, zeta = pearson3_standard_params(skew)
if skew > 0:
cdf = np.zeros_like(z)
u = beta * (z - zeta)
mask = u > 0
cdf[mask] = gammainc(alpha, u[mask])
return cdf
# skew < 0
cdf = np.ones_like(z)
u = beta * (z - zeta)
mask = u > 0 # this corresponds to z < zeta for beta < 0
cdf[mask] = gammaincc(alpha, u[mask])
return cdf
def pearson3_entropy(skew, loc=0.0, scale=1.0):
"""Differential entropy of Pearson3(skew, loc, scale)."""
skew = float(skew)
loc = float(loc)
scale = float(scale)
if scale <= 0:
raise ValueError("scale must be > 0")
if abs(skew) < NORM2PEARSON_TRANSITION:
return 0.5 * np.log(2 * np.pi * np.e * scale**2)
alpha, beta, _ = pearson3_standard_params(skew)
# If Y ~ Gamma(alpha, 1), then h(Y) = alpha + log Γ(alpha) + (1-alpha) ψ(alpha)
h_gamma = alpha + gammaln(alpha) + (1 - alpha) * psi(alpha)
# Z = zeta + (1/beta)Y => h(Z) = h(Y) + log|1/beta| = h(Y) - log|beta|
# X = loc + scale*Z => h(X) = h(Z) + log(scale)
return h_gamma - np.log(abs(beta)) + np.log(scale)
def pearson3_mgf(t, skew, loc=0.0, scale=1.0):
"""MGF M_X(t). Returns +inf outside its domain when skew != 0."""
t = np.asarray(t, dtype=float)
skew = float(skew)
loc = float(loc)
scale = float(scale)
if scale <= 0:
raise ValueError("scale must be > 0")
if abs(skew) < NORM2PEARSON_TRANSITION:
return np.exp(loc * t + 0.5 * (scale * t) ** 2)
alpha, beta, zeta = pearson3_standard_params(skew)
u = 1.0 - (scale * t) / beta
out = np.full_like(t, np.inf, dtype=float)
mask = u > 0
out[mask] = np.exp(loc * t[mask] + zeta * scale * t[mask]) * (u[mask] ** (-alpha))
return out
def pearson3_cf(t, skew, loc=0.0, scale=1.0):
"""Characteristic function φ_X(t)."""
t = np.asarray(t, dtype=float)
skew = float(skew)
loc = float(loc)
scale = float(scale)
if scale <= 0:
raise ValueError("scale must be > 0")
if abs(skew) < NORM2PEARSON_TRANSITION:
return np.exp(1j * loc * t - 0.5 * (scale * t) ** 2)
alpha, beta, zeta = pearson3_standard_params(skew)
return np.exp(1j * (loc + zeta * scale) * t) * (1.0 - 1j * (scale * t) / beta) ** (-alpha)
4) Moments & properties#
For \(X\sim\text{Pearson3}(\gamma,\mu,\sigma)\) in SciPy’s parameterization:
| Quantity | Value |
|---|---|
| Mean | \(\mathbb{E}[X]=\mu\) |
| Variance | \(\mathrm{Var}(X)=\sigma^2\) |
| Skewness | \(\mathrm{Skew}(X)=\gamma\) |
| Excess kurtosis | \(\gamma_2 = \dfrac{\mathbb{E}[(X-\mu)^4]}{\sigma^4}-3 = \dfrac{6}{\alpha}=\tfrac{3}{2}\gamma^2\) (for \(\gamma\neq 0\)) |
| All moments? | Yes (it’s a shifted/scaled Gamma) |
MGF / characteristic function#
Write \(Z=(X-\mu)/\sigma\).
If \(\gamma=0\): \(Z\sim\mathcal{N}(0,1)\), so \begin{aligned} M_X(t) &= \exp\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right), \\ \varphi_X(t) &= \exp\left(i\mu t - \tfrac{1}{2}\sigma^2 t^2\right). \end{aligned}
If \(\gamma\neq 0\): using \(\alpha,\beta,\zeta\) from the definition, the standard MGF is \begin{aligned} M_Z(t) = \exp(\zeta t)\,(1 - t/\beta)^{-\alpha},\qquad 1 - t/\beta>0, \end{aligned} and the characteristic function is \begin{aligned} \varphi_Z(t) = \exp(i\zeta t)\,(1 - i t/\beta)^{-\alpha}. \end{aligned} Applying the location–scale transform \(X=\mu+\sigma Z\) gives \begin{aligned} M_X(t) &= \exp(\mu t)\,M_Z(\sigma t), \\ \varphi_X(t) &= \exp(i\mu t)\,\varphi_Z(\sigma t). \end{aligned}
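A Monte Carlo sanity check of the MGF formula (a sketch; the parameter values, the evaluation point `t`, the seed, and the sample size are all arbitrary choices):

```python
import numpy as np
from scipy import stats

g, mu, sigma = 0.9, 1.0, 0.5
beta = 2.0 / g
alpha = beta**2
zeta = -beta

t = 0.4  # inside the MGF domain: 1 - sigma*t/beta > 0
mgf_closed = np.exp(mu * t + zeta * sigma * t) * (1.0 - sigma * t / beta) ** (-alpha)

# Compare against a sample average of exp(t*X)
rng_mc = np.random.default_rng(0)
x = stats.pearson3.rvs(g, loc=mu, scale=sigma, size=400_000, random_state=rng_mc)
mgf_mc = float(np.mean(np.exp(t * x)))
print(mgf_closed, mgf_mc)
```

The two values should agree to roughly Monte Carlo accuracy (a fraction of a percent at this sample size).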
Entropy#
If \(\gamma=0\) (Normal): \(h(X)=\tfrac{1}{2}\log\bigl(2\pi e\sigma^2\bigr)\).
If \(\gamma\neq 0\): with \(\alpha=4/\gamma^2\) and \(\beta=2/\gamma\),
\begin{aligned} h(X) = \underbrace{\alpha + \log\Gamma(\alpha) + (1-\alpha)\,\psi(\alpha)}_{h(\text{Gamma}(\alpha,1))} - \log|\beta| + \log\sigma. \end{aligned}
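The closed form can be sanity-checked via the identity \(h(X)=\mathbb{E}[-\log f_X(X)]\), estimated by Monte Carlo (a sketch; parameter values, seed, and sample size are arbitrary choices):

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln, psi

g, sigma = 1.3, 2.0
alpha = 4.0 / g**2
beta = 2.0 / g

h_closed = alpha + gammaln(alpha) + (1.0 - alpha) * psi(alpha) - np.log(abs(beta)) + np.log(sigma)

# Monte Carlo estimate: h(X) = E[-log f_X(X)]
rng_mc = np.random.default_rng(0)
x = stats.pearson3.rvs(g, scale=sigma, size=400_000, random_state=rng_mc)
h_mc = float(-np.mean(stats.pearson3.logpdf(x, g, scale=sigma)))
print(h_closed, h_mc)
```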
# Compare closed-form moments to SciPy
skew_ex, loc_ex, scale_ex = 2.0, 10.0, 3.0
mean_theory = loc_ex
var_theory = scale_ex**2
skew_theory = skew_ex
kurt_excess_theory = 1.5 * skew_ex**2
mean_scipy, var_scipy, skew_scipy, kurt_excess_scipy = stats.pearson3.stats(
skew_ex, loc=loc_ex, scale=scale_ex, moments="mvsk"
)
(mean_theory, mean_scipy), (var_theory, var_scipy), (skew_theory, skew_scipy), (
kurt_excess_theory,
kurt_excess_scipy,
)
((10.0, 10.0), (9.0, 9.0), (2.0, 2.0), (6.0, 6.0))
5) Parameter interpretation#
SciPy uses three parameters:
- `skew` \(= \gamma\): controls asymmetry and induces a finite endpoint of the support.
  - \(\gamma>0\) ⇒ long right tail with a lower bound.
  - \(\gamma<0\) ⇒ long left tail with an upper bound.
  - The implied Gamma shape is \(\alpha = 4/\gamma^2\). Small \(|\gamma|\) ⇒ large \(\alpha\) ⇒ more symmetric (nearly Normal); large \(|\gamma|\) ⇒ small \(\alpha\) ⇒ more skewed/heavy-tailed.
- `loc` \(= \mu\): shifts the distribution; for `pearson3`, it is exactly the mean.
- `scale` \(= \sigma\): rescales the distribution; it is exactly the standard deviation.
The finite endpoint#
For \(\gamma\neq 0\), the standardized endpoint is at \(\zeta=-2/\gamma\), so the endpoint for \(X\) is
\begin{aligned} \text{endpoint}(X)=\mu + \sigma\zeta = \mu - \frac{2\sigma}{\gamma}. \end{aligned}
This is a big practical difference versus, say, a skew-normal: the Pearson type III density is exactly zero beyond that endpoint.
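A Monte Carlo check of the endpoint formula (a sketch; parameter values, seed, and sample size are arbitrary choices): for \(\gamma>0\) every draw lies above \(\mu-2\sigma/\gamma\), and for \(\gamma<0\) every draw lies below it.

```python
import numpy as np
from scipy import stats

mu, sigma = 2.0, 0.5
rng_check = np.random.default_rng(0)
results = {}
for g in [1.5, -1.5]:
    endpoint = mu - 2.0 * sigma / g  # finite endpoint of the support
    s = stats.pearson3.rvs(g, loc=mu, scale=sigma, size=50_000, random_state=rng_check)
    results[g] = (endpoint, float(s.min()), float(s.max()))
print(results)
```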
# PDF shape as skew changes (loc=0, scale=1)
skews = [-2.0, -1.0, 0.0, 1.0, 2.0]
x = np.linspace(-6, 6, 1400)
fig = go.Figure()
for s in skews:
y = pearson3_pdf(x, s, loc=0.0, scale=1.0)
if abs(s) < NORM2PEARSON_TRANSITION:
label = f"skew={s:g} (Normal)"
else:
endpoint = pearson3_support(s, 0.0, 1.0)[0 if s > 0 else 1]
label = f"skew={s:g} (endpoint={endpoint:.2f})"
fig.add_trace(
go.Scatter(
x=x,
y=y,
mode="lines",
name=label,
line=dict(width=3),
)
)
fig.update_layout(
title="Pearson type III PDF for different skew values (loc=0, scale=1)",
xaxis_title="x",
yaxis_title="density",
)
fig.show()
# Location and scale effects (fix skew)
skew_fixed = 1.5
x = np.linspace(-6, 10, 1400)
# Vary loc
fig = go.Figure()
for loc in [-2.0, 0.0, 2.0]:
fig.add_trace(
go.Scatter(
x=x,
y=pearson3_pdf(x, skew_fixed, loc=loc, scale=1.0),
mode="lines",
name=f"loc={loc:g}",
line=dict(width=3),
)
)
fig.update_layout(
title=f"Effect of loc (mean) at skew={skew_fixed:g}, scale=1",
xaxis_title="x",
yaxis_title="density",
)
fig.show()
# Vary scale
fig = go.Figure()
for scale in [0.6, 1.0, 1.8]:
fig.add_trace(
go.Scatter(
x=x,
y=pearson3_pdf(x, skew_fixed, loc=0.0, scale=scale),
mode="lines",
name=f"scale={scale:g}",
line=dict(width=3),
)
)
fig.update_layout(
title=f"Effect of scale (std dev) at skew={skew_fixed:g}, loc=0",
xaxis_title="x",
yaxis_title="density",
)
fig.show()
6) Derivations#
Expectation#
Assume \(\gamma\neq 0\). By construction, \(Z = \zeta + Y/\beta\) with \(Y\sim\text{Gamma}(\alpha,1)\).
Because \(\mathbb{E}[Y]=\alpha\),
\begin{aligned} \mathbb{E}[Z] = \zeta + \frac{1}{\beta}\mathbb{E}[Y] = \zeta + \frac{\alpha}{\beta}. \end{aligned}
But \(\zeta = -\alpha/\beta\), so \(\mathbb{E}[Z]=0\). For \(X=\mu+\sigma Z\),
\begin{aligned} \mathbb{E}[X]=\mu + \sigma\mathbb{E}[Z]=\mu. \end{aligned}
Variance#
Since \(\mathrm{Var}(Y)=\alpha\) for \(Y\sim\text{Gamma}(\alpha,1)\),
\begin{aligned} \mathrm{Var}(Z) = \mathrm{Var}(Y/\beta) = \frac{\alpha}{\beta^2} = 1 \end{aligned}
because \(\alpha=\beta^2\). Then \(\mathrm{Var}(X)=\sigma^2\mathrm{Var}(Z)=\sigma^2\).
Likelihood (i.i.d. sample)#
Let \(x_1,\dots,x_n\) be i.i.d. from \(\text{Pearson3}(\gamma,\mu,\sigma)\) and set \(z_i=(x_i-\mu)/\sigma\).
If \(\gamma=0\) (Normal), the log-likelihood is the usual Normal log-likelihood.
If \(\gamma\neq 0\), define \(u_i = \beta(z_i-\zeta)\). The density is proportional to a Gamma density in \(u_i\), plus Jacobians:
\begin{aligned} \ell(\gamma,\mu,\sigma; x) &= \sum_{i=1}^n \log f_X(x_i) \\ &= -n\log\sigma + n\log|\beta| - n\log\Gamma(\alpha) + (\alpha-1)\sum_{i=1}^n \log u_i - \sum_{i=1}^n u_i, \end{aligned}
with the feasibility constraint \(u_i>0\) for all \(i\) (otherwise \(\ell=-\infty\)).
That constraint is why pearson3 fitting can be numerically delicate: changing \((\gamma,\mu,\sigma)\) changes the endpoint of the support.
def pearson3_loglik(x, skew, loc=0.0, scale=1.0):
lp = pearson3_logpdf(x, skew, loc=loc, scale=scale)
if np.any(~np.isfinite(lp)):
return -np.inf
return float(np.sum(lp))
# Moment (plug-in) estimates: mean, std, skewness
skew_true, loc_true, scale_true = 1.2, -0.5, 2.0
x = stats.pearson3.rvs(skew_true, loc=loc_true, scale=scale_true, size=2000, random_state=rng)
loc_mom = float(x.mean())
scale_mom = float(x.std(ddof=0))
skew_mom = float(np.mean(((x - loc_mom) / scale_mom) ** 3))
params_mom = (skew_mom, loc_mom, scale_mom)
params_mle = stats.pearson3.fit(x) # (skew, loc, scale)
ll_true = pearson3_loglik(x, skew_true, loc_true, scale_true)
ll_mom = pearson3_loglik(x, *params_mom)
ll_mle = pearson3_loglik(x, *params_mle)
support_mom = pearson3_support(*params_mom)
{
"true": (skew_true, loc_true, scale_true, ll_true),
"mom": (params_mom, ll_mom, support_mom),
"mle": (params_mle, ll_mle),
}
{'true': (1.2, -0.5, 2.0, -3975.44243522909),
'mom': ((1.2499229604521087, -0.47616893803532, 2.0112388704572353),
-inf,
(-3.6943494725056225, inf)),
'mle': ((1.1819509816812133, -0.4762059470050811, 2.0047791374840935),
-3975.1236522369213)}
7) Sampling & simulation (NumPy-only)#
Because Pearson type III can be generated from a Gamma variable, sampling reduces to:
1. Sample \(Y\sim\text{Gamma}(\alpha, 1)\).
2. Transform \(Z = \zeta + Y/\beta\).
3. Transform \(X = \mu + \sigma Z\).
So the real “work” is a Gamma sampler.
Marsaglia–Tsang (2000) sketch#
For \(\alpha\ge 1\) (shape), Marsaglia–Tsang samples a Normal proposal and uses a clever acceptance test. For \(\alpha<1\), you can sample from \(\alpha+1\) and apply a power correction:
\begin{aligned} Y_{\alpha} \overset{d}{=} Y_{\alpha+1}\,U^{1/\alpha},\qquad U\sim\text{Unif}(0,1). \end{aligned}
Below is a NumPy-only implementation.
def _gamma_rvs_mt_shape_ge_1(alpha, n, rng):
"""Marsaglia–Tsang sampler for Gamma(alpha, 1) with alpha >= 1."""
d = alpha - 1.0 / 3.0
c = 1.0 / np.sqrt(9.0 * d)
out = np.empty(n, dtype=float)
filled = 0
while filled < n:
m = n - filled
z = rng.normal(size=m)
v = (1.0 + c * z) ** 3
valid = v > 0
if not np.any(valid):
continue
z = z[valid]
v = v[valid]
u = rng.random(size=v.size)
accept = (u < 1.0 - 0.0331 * (z**4)) | (
np.log(u) < 0.5 * z**2 + d * (1.0 - v + np.log(v))
)
accepted = d * v[accept]
k = accepted.size
out[filled : filled + k] = accepted
filled += k
return out
def gamma_rvs_numpy(alpha, theta=1.0, size=1, rng=None):
"""Sample from Gamma(alpha, theta) using NumPy only."""
if rng is None:
rng = np.random.default_rng()
alpha = float(alpha)
theta = float(theta)
if alpha <= 0 or theta <= 0:
raise ValueError("alpha and theta must be > 0")
size_tuple = (size,) if isinstance(size, int) else tuple(size)
n = int(np.prod(size_tuple))
if alpha >= 1:
x = _gamma_rvs_mt_shape_ge_1(alpha, n, rng)
else:
y = _gamma_rvs_mt_shape_ge_1(alpha + 1.0, n, rng)
u = rng.random(size=n)
x = y * (u ** (1.0 / alpha))
return (x * theta).reshape(size_tuple)
def pearson3_rvs_numpy(skew, loc=0.0, scale=1.0, size=1, rng=None):
"""Sample from Pearson3(skew, loc, scale) using NumPy only."""
if rng is None:
rng = np.random.default_rng()
skew = float(skew)
loc = float(loc)
scale = float(scale)
if scale <= 0:
raise ValueError("scale must be > 0")
if abs(skew) < NORM2PEARSON_TRANSITION:
return rng.normal(loc=loc, scale=scale, size=size)
alpha, beta, zeta = pearson3_standard_params(skew)
y = gamma_rvs_numpy(alpha, theta=1.0, size=size, rng=rng)
z = zeta + y / beta
return loc + scale * z
# Smoke test: sample moments
skew_test, loc_test, scale_test = 1.5, -1.0, 2.0
s = pearson3_rvs_numpy(skew_test, loc=loc_test, scale=scale_test, size=120_000, rng=rng)
sample_mean = float(s.mean())
sample_var = float(s.var())
sample_skew = float(np.mean(((s - sample_mean) / np.sqrt(sample_var)) ** 3))
(sample_mean, loc_test), (sample_var, scale_test**2), (sample_skew, skew_test)
((-1.0038070626427584, -1.0),
(4.037192684901968, 4.0),
(1.5279894563295269, 1.5))
8) Visualization#
def ecdf(samples):
x = np.sort(np.asarray(samples))
y = np.arange(1, x.size + 1) / x.size
return x, y
skew_viz, loc_viz, scale_viz = 1.5, 0.0, 1.0
n_viz = 90_000
samples = pearson3_rvs_numpy(skew_viz, loc=loc_viz, scale=scale_viz, size=n_viz, rng=rng)
lb, ub = pearson3_support(skew_viz, loc_viz, scale_viz)
x_min = (lb + 1e-6 * scale_viz) if np.isfinite(lb) else float(np.quantile(samples, 0.001))
x_max = (ub - 1e-6 * scale_viz) if np.isfinite(ub) else float(np.quantile(samples, 0.995))
x_grid = np.linspace(x_min, x_max, 700)
# PDF + histogram
fig = go.Figure()
fig.add_trace(
go.Histogram(
x=samples,
nbinsx=80,
histnorm="probability density",
name="Monte Carlo samples",
opacity=0.55,
)
)
fig.add_trace(
go.Scatter(
x=x_grid,
y=pearson3_pdf(x_grid, skew_viz, loc=loc_viz, scale=scale_viz),
mode="lines",
name="Theoretical PDF",
line=dict(width=3),
)
)
fig.update_layout(
title=f"Pearson3(skew={skew_viz:g}, loc={loc_viz:g}, scale={scale_viz:g}): histogram vs PDF",
xaxis_title="x",
yaxis_title="density",
bargap=0.02,
)
fig.show()
# CDF + empirical CDF
xs, ys = ecdf(samples)
fig = go.Figure()
fig.add_trace(
go.Scatter(
x=x_grid,
y=pearson3_cdf(x_grid, skew_viz, loc=loc_viz, scale=scale_viz),
mode="lines",
name="Theoretical CDF",
line=dict(width=3),
)
)
fig.add_trace(
go.Scatter(
x=xs[::120],
y=ys[::120],
mode="markers",
name="Empirical CDF (subsampled)",
marker=dict(size=5),
)
)
fig.update_layout(
title=f"Pearson3(skew={skew_viz:g}, loc={loc_viz:g}, scale={scale_viz:g}): empirical CDF vs CDF",
xaxis_title="x",
yaxis_title="CDF",
)
fig.show()
9) SciPy integration (scipy.stats.pearson3)#
SciPy provides a ready-to-use implementation:
- `stats.pearson3.pdf(x, skew, loc=..., scale=...)`
- `stats.pearson3.cdf(x, skew, loc=..., scale=...)`
- `stats.pearson3.rvs(skew, loc=..., scale=..., size=...)`
- `stats.pearson3.fit(data)` (MLE)
Below is a quick tour and a numerical cross-check against our formulas.
skew_s, loc_s, scale_s = 1.7, -0.2, 1.3
dist = stats.pearson3(skew_s, loc=loc_s, scale=scale_s)
# Cross-check PDF/CDF on a grid
lb, ub = pearson3_support(skew_s, loc_s, scale_s)
x_min = (lb + 1e-6 * scale_s) if np.isfinite(lb) else loc_s - 5 * scale_s
x_max = (ub - 1e-6 * scale_s) if np.isfinite(ub) else loc_s + 8 * scale_s
xg = np.linspace(x_min, x_max, 600)
max_pdf_err = float(np.max(np.abs(dist.pdf(xg) - pearson3_pdf(xg, skew_s, loc=loc_s, scale=scale_s))))
max_cdf_err = float(np.max(np.abs(dist.cdf(xg) - pearson3_cdf(xg, skew_s, loc=loc_s, scale=scale_s))))
max_pdf_err, max_cdf_err
(1.1102230246251565e-16, 0.0)
# Fit example (MLE)
skew_true, loc_true, scale_true = 1.2, -0.5, 2.0
x = stats.pearson3.rvs(skew_true, loc=loc_true, scale=scale_true, size=2500, random_state=rng)
skew_hat, loc_hat, scale_hat = stats.pearson3.fit(x)
{
"true": (skew_true, loc_true, scale_true),
"fit": (skew_hat, loc_hat, scale_hat),
}
{'true': (1.2, -0.5, 2.0),
'fit': (1.2304529907529282, -0.5751002226713251, 2.0022026684074685)}
10) Statistical use cases#
A) Hypothesis testing#
A common question is whether the data are “close enough” to Normal (skewness \(\approx 0\)) or whether a skewed model is warranted.
Because pearson3 changes its support when \(\gamma\neq 0\) (one-sided endpoint), the usual chi-square reference for likelihood-ratio tests is not reliable. A practical approach is a parametric bootstrap.
B) Bayesian modeling#
Pearson type III can be used as a likelihood for skewed measurement errors or residuals. It is not conjugate in general, but posteriors are easy to explore with:
- grid evaluation (low-dimensional problems)
- MCMC/VI (higher-dimensional problems)
C) Generative modeling#
In hydrology, a classic model is:
\begin{aligned} \log_{10}(Q) \sim \text{Pearson3}(\gamma,\mu,\sigma), \end{aligned}
which implies a log-Pearson III model for the flow \(Q\).
# A) Hypothesis testing via parametric bootstrap
def fit_norm_and_ll(x):
mu, sigma = stats.norm.fit(x)
ll = float(stats.norm.logpdf(x, loc=mu, scale=sigma).sum())
return mu, sigma, ll
def fit_p3_and_ll(x):
skew, loc, scale = stats.pearson3.fit(x)
ll = float(stats.pearson3.logpdf(x, skew, loc=loc, scale=scale).sum())
return skew, loc, scale, ll
# Observed sample (simulate under an alternative with skew != 0)
n = 400
skew_alt, loc_alt, scale_alt = 1.3, 0.0, 1.0
x_obs = stats.pearson3.rvs(skew_alt, loc=loc_alt, scale=scale_alt, size=n, random_state=rng)
mu0, sigma0, ll0 = fit_norm_and_ll(x_obs)
skew1, loc1, scale1, ll1 = fit_p3_and_ll(x_obs)
T_obs = 2 * (ll1 - ll0)
# Parametric bootstrap under H0: Normal with fitted params
B = 80 # increase for a more stable p-value
Ts = []
for _ in range(B):
xb = rng.normal(loc=mu0, scale=sigma0, size=n)
_, _, ll0b = fit_norm_and_ll(xb)
_, _, _, ll1b = fit_p3_and_ll(xb)
Ts.append(2 * (ll1b - ll0b))
Ts = np.asarray(Ts)
p_value = (1 + np.sum(Ts >= T_obs)) / (B + 1)
{
"T_obs": float(T_obs),
"bootstrap_p_value": float(p_value),
"p3_fit_skew": float(skew1),
}
{'T_obs': 134.65971186480692,
'bootstrap_p_value': 0.012345679012345678,
'p3_fit_skew': 1.4289216067243458}
# B) Bayesian modeling (grid posterior for loc) with fixed skew and scale
skew_fixed, scale_fixed = 1.2, 1.0
n = 60
loc_true = 1.5
# Simulated data
x = stats.pearson3.rvs(skew_fixed, loc=loc_true, scale=scale_fixed, size=n, random_state=rng)
# Prior: loc ~ Normal(0, 5^2)
prior_mu, prior_sigma = 0.0, 5.0
loc_grid = np.linspace(-2.0, 4.5, 700)
log_prior = stats.norm.logpdf(loc_grid, loc=prior_mu, scale=prior_sigma)
log_like = np.array(
[
stats.pearson3.logpdf(x, skew_fixed, loc=mu, scale=scale_fixed).sum()
for mu in loc_grid
],
dtype=float,
)
log_post = log_prior + log_like
log_post -= np.max(log_post)
post = np.exp(log_post)
post /= np.trapz(post, loc_grid)
# Posterior summaries
cdf = np.cumsum(post)
cdf /= cdf[-1]
loc_map = float(loc_grid[np.argmax(post)])
loc_lo = float(np.interp(0.025, cdf, loc_grid))
loc_hi = float(np.interp(0.975, cdf, loc_grid))
fig = go.Figure()
fig.add_trace(go.Scatter(x=loc_grid, y=post, mode="lines", line=dict(width=3)))
fig.add_vline(x=loc_true, line_dash="dash", line_color="black", annotation_text="true loc")
fig.add_vline(x=loc_map, line_dash="dot", line_color="firebrick", annotation_text="MAP")
fig.update_layout(
title=f"Posterior for loc (fixed skew={skew_fixed:g}, scale={scale_fixed:g})",
xaxis_title="loc",
yaxis_title="posterior density",
)
fig.show()
{"loc_true": loc_true, "loc_map": loc_map, "95%_CI": (loc_lo, loc_hi)}
{'loc_true': 1.5,
'loc_map': 1.552217453505007,
'95%_CI': (1.3589194706864023, 1.6688667024234018)}
# C) Generative modeling: log-Pearson III (hydrology-style)
# Model log10(Q) as Pearson3
skew_lp3, loc_lp3, scale_lp3 = 0.8, 3.0, 0.18
m = 50_000
log10Q = stats.pearson3.rvs(skew_lp3, loc=loc_lp3, scale=scale_lp3, size=m, random_state=rng)
Q = 10 ** log10Q
q50, q90, q99, q999 = np.quantile(Q, [0.5, 0.9, 0.99, 0.999])
fig = go.Figure()
fig.add_trace(
go.Histogram(
x=Q,
nbinsx=120,
histnorm="probability density",
opacity=0.65,
name="Simulated Q",
)
)
fig.update_layout(
title="Log-Pearson III simulation: distribution of Q = 10^{log10(Q)}",
xaxis_title="Q",
yaxis_title="density",
)
fig.update_xaxes(type="log")
fig.show()
{"median": float(q50), "90%": float(q90), "99%": float(q99), "99.9%": float(q999)}
{'median': 948.662514114546,
'90%': 1730.2757084623959,
'99%': 3326.7075721286647,
'99.9%': 5740.577006565781}
11) Pitfalls#
- Parameterization confusion: SciPy’s `skew` is the skewness \(\gamma\), not the Gamma shape. The implied Gamma shape in the standard form is \(\alpha=4/\gamma^2\).
- Support depends on parameters: for \(\gamma\neq 0\) the distribution has a finite endpoint at \(\mu-2\sigma/\gamma\). Data outside the support have zero likelihood.
- Near-zero skew: the formulas involve \(1/\gamma\) and \(1/\gamma^2\). SciPy switches to a Normal approximation for \(|\gamma|\lesssim 1.6\times 10^{-5}\).
- Fitting can be sensitive: because the support endpoint moves with the parameters, numerical optimization can be delicate (especially if data are close to the endpoint).
- `support()` vs actual support: `stats.pearson3.support(skew)` reports \((-\infty,\infty)\), but the true support is one-sided when \(\gamma\neq 0\) (the PDF is exactly zero beyond the endpoint).
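A small demonstration of the last pitfall (a sketch; whether `support` reports the whole real line may depend on the SciPy version): the reported support is compared against the endpoint \(\mu-2\sigma/\gamma\) that the samples actually respect.

```python
import numpy as np
from scipy import stats

g = 1.0
reported = stats.pearson3.support(g)   # may be (-inf, inf) despite the hard bound below
true_lower = -2.0 / g                  # actual lower endpoint for loc=0, scale=1
s = stats.pearson3.rvs(g, size=50_000, random_state=np.random.default_rng(1))
print(reported, true_lower, float(s.min()))
```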
12) Summary#
- `pearson3` is a shifted/scaled Gamma parameterized so that `skew` \(= \gamma\) equals the distribution’s skewness.
- For \(\gamma\neq 0\), it has a finite endpoint at \(\mu-2\sigma/\gamma\) and a Gamma-like tail on the other side.
- Mean and variance are simply `loc` and `scale`\(^2\); excess kurtosis is \(\tfrac{3}{2}\gamma^2\).
- Sampling is straightforward by sampling a Gamma variable (Marsaglia–Tsang) and transforming it.
- SciPy’s `stats.pearson3` supports `pdf`, `cdf`, `rvs`, and `fit`, but be mindful of the moving-support pitfalls.
References#
- SciPy documentation/source for `scipy.stats.pearson3`
- Marsaglia, G. & Tsang, W. W. (2000). A Simple Method for Generating Gamma Variables. ACM Transactions on Mathematical Software, 26(3), 363–372.
- Pearson system overview (Pearson distribution types I–VII)
- Hydrology practice: log-Pearson type III for flood frequency (agency/standard guidance varies by region)